Krzysztof Tomala - Explainable AI - Homework 4

Report

List of features with short descriptions

Age : Age of the patient

Sex : Sex of the patient

exang: exercise induced angina (1 = yes; 0 = no)

ca: number of major vessels visible in fluoroscopy (0-3)

cp : chest pain type (Value 1: typical angina; Value 2: atypical angina; Value 3: non-anginal pain; Value 4: asymptomatic)

trtbps : resting blood pressure (in mm Hg)

chol : cholesterol in mg/dl fetched via BMI sensor

fbs : (fasting blood sugar > 120 mg/dl) (1 = true; 0 = false)

rest_ecg : resting electrocardiographic results (Value 0: normal; Value 1: ST-T wave abnormality (T wave inversions and/or ST elevation or depression of > 0.05 mV); Value 2: probable or definite left ventricular hypertrophy by Estes' criteria)

thalach : maximum heart rate achieved

thal : Thallium Stress Test result

slope : the slope of the peak exercise ST segment (2 = upsloping; 1 = flat; 0 = downsloping)

oldpeak : ST depression induced by exercise relative to rest

Tasks

Then, calculate the what-if explanations of these predictions using Ceteris Paribus profiles (also called What-if plots), e.g. in Python: AIX360, Alibi, dalex, PDPbox; in R: pdp, DALEX, ALEplot. *implement CP yourself for a potential bonus point

image

image

An interesting thing here is that for both observations the resting blood pressure that would minimize the risk of a heart attack is higher than the recommended value (around 120 mm Hg).
It is especially visible for the second observation. I think the model may underestimate the importance of resting blood pressure, because it is strongly correlated with the other variables.

Find two observations in the data set, such that they have different CP profiles. For example, model predictions are increasing with age for one observation and decreasing with age for another one. NOTE that you will need to have a model with interactions to observe such differences.

Such a situation happens for the observations I have mentioned above.
The first observation is a woman who is fairly healthy, so her risk of a heart attack increases with age, as we would expect.
The second observation is a rather unhealthy man who, despite not being very old, has some bad examination results, such as strongly elevated cholesterol. This is quite rare in younger people, so it may mean that his health problems are very serious, while at an older age it could just be a sign of the normal decline in health.
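The role of interactions can be illustrated with a toy sketch (a hypothetical hand-written predictor, not the fitted forest): with an age × sex interaction term, the CP profile along age goes up for one observation and down for another.

```python
import numpy as np

# Toy predictor with an age*sex interaction (hypothetical, for illustration):
# risk grows with age for sex=0 and shrinks with age for sex=1.
def predict(obs):
    age, sex = obs
    return 0.5 + 0.01 * age * (1 - 2 * sex)

def cp_profile(obs, grid):
    """Ceteris paribus profile along age: vary age, keep sex fixed."""
    return [predict((a, obs[1])) for a in grid]

grid = np.linspace(30, 70, 5)
up = cp_profile((50, 0), grid)    # increasing in age
down = cp_profile((50, 1), grid)  # decreasing in age
print(up[0] < up[-1], down[0] > down[-1])  # True True
```

A purely additive model cannot produce such opposite slopes: without the interaction term, changing sex would only shift the age profile up or down.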

Compare CP, which is a local explanation, with PDP, which is a global explanation. *implement PDP yourself for a potential bonus point

image

Here we can see the global explanations. I think they are quite similar to what we have seen in the local explanations. We can see that the differences between the maximum and minimum are quite small on both plots.
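The two kinds of plots are tightly related: a PDP curve is just the average of the CP profiles over the whole dataset. A minimal sketch on synthetic data (the predictor and feature ranges below are hypothetical, not the heart model) verifies this numerically:

```python
import numpy as np

# Hypothetical nonlinear predictor of risk from (age, chol) pairs.
def predict(X):
    return 1 / (1 + np.exp(-(0.05 * X[:, 0] + 0.002 * X[:, 0] * X[:, 1] - 4)))

rng = np.random.default_rng(0)
X = np.column_stack([rng.uniform(30, 70, 50), rng.uniform(150, 300, 50)])
grid = np.linspace(30, 70, 20)

# CP profile for each observation: vary age, keep chol fixed.
cp = np.array([predict(np.column_stack([grid, np.full(len(grid), c)]))
               for c in X[:, 1]])                     # shape (50, 20)

# PDP: set age to each grid value for the whole dataset and average.
pdp = np.array([predict(np.column_stack([np.full(len(X), a), X[:, 1]])).mean()
                for a in grid])                       # shape (20,)

assert np.allclose(cp.mean(axis=0), pdp)  # PDP == mean of CP profiles
```

This also explains why PDP can hide the interesting cases: two groups of CP profiles with opposite slopes can average out to a nearly flat PDP curve.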

The resting blood pressure plot is very surprising, since all the medical research says that high resting blood pressure is a bad sign. There are a few potential explanations for why this plot might look like this:

  • people with health problems often take medications that lower resting blood pressure, which might bias the model into thinking that people with low resting blood pressure are the ones taking medications (there isn't any indication in the dataset whether someone is using such medications)
  • having high blood pressure together with some other problems may be easier to treat than having low blood pressure with the same problems
  • since there are a lot of correlated variables, our model may learn the wrong relationship
  • the output labels in the dataset could be reversed relative to the dataset description (1.0 for high risk, 0.0 for low risk), but it might also be some problem with the model related to having a lot of correlated variables
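One way to probe the reversed-label hypothesis would be to compare group means of clinically unambiguous features across the two classes. The sketch below shows this check on a hypothetical miniature frame using the dataset's column names; the real check would run on heart.csv.

```python
import pandas as pd

# Hypothetical miniature frame (made-up values, real column names). If 'output'
# followed the description (1 = high risk), the high-risk group should show
# e.g. more exercise-induced angina and larger ST depression on average.
df = pd.DataFrame({
    'exng':    [0, 0, 0, 1, 1, 1],
    'oldpeak': [0.0, 0.5, 0.2, 2.1, 1.8, 2.5],
    'output':  [1, 1, 1, 0, 0, 0],
})
print(df.groupby('output')[['exng', 'oldpeak']].mean())
# If the means look clinically backwards (as in this toy frame),
# the labels may be reversed.
```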

Compare PDP between at least two different models.

image

We can see that the plots look different than before, because logistic regression is a linear model.
The resting blood pressure profile has a similar tendency as before (heart attack risk decreases as resting blood pressure increases).
For age the profile is strongly different: here we just get an increasing curve, while before we had a much more complex relation.
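This behavior is expected: without interaction terms, every CP profile of a logistic regression along a feature is the same sigmoid shifted horizontally, so all profiles are monotone in the same direction and the PDP inherits that shape. A toy sketch (with hypothetical coefficients, not the fitted ones) makes this concrete:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Hypothetical logistic model: p = sigmoid(w_age*age + w_bp*bp + b).
w_age, w_bp, b = 0.04, -0.02, 0.5
grid = np.linspace(30, 70, 25)

# CP profiles along age for two observations differing only in blood pressure:
# each is the same sigmoid shifted horizontally, so both are monotone in the
# same direction -- a linear model cannot produce opposite slopes.
p1 = sigmoid(w_age * grid + w_bp * 120 + b)
p2 = sigmoid(w_age * grid + w_bp * 160 + b)

print(np.all(np.diff(p1) > 0), np.all(np.diff(p2) > 0))  # True True
```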

CP and PDP implementations

In the report I am still using the implementations from dalex, because they look better. (The plots are labeled and there is a nice grid in the background. It is pretty easy to add this in matplotlib, but I wanted to keep the code very simple and straightforward.)

In [1]:
import numpy as np
import matplotlib.pyplot as plt

def CP(model, observation, variables, minVals, maxVals):
    for i, variable in enumerate(variables):
        x = np.linspace(minVals[i], maxVals[i], num=75)
        predictions = []
        for j in x:
            o = observation.copy()  # work on a copy so the original observation is not modified
            o[[variable]] = j
            predictions.append(model.predict_proba(o)[:, 1])

        # add the prediction for the original, unmodified observation
        predictions.append(model.predict_proba(observation)[:, 1])
        x = np.append(x, observation[variable])

        plt.plot(x, predictions)
        plt.show()
In [2]:
import copy
def PDP(model, X, variable, minVal, maxVal):
    predictions = []
    dataset = copy.deepcopy(X)  # copy so the original data is not modified
    x = np.linspace(minVal, maxVal, num=75)
    for point in x:
        dataset[variable] = point
        # average model predictions over all observations in the dataset
        predictions.append(model.predict_proba(dataset)[:, 1].mean())
    plt.plot(x, predictions)
    plt.show()

image

Appendix

In [3]:
import pandas as pd
import sklearn
from sklearn import ensemble
import dalex as dx
import lime
In [4]:
dataset = pd.read_csv('heart.csv')
dataset = pd.get_dummies(dataset)
dataset
Out[4]:
age sex cp trtbps chol fbs restecg thalachh exng oldpeak slp caa thall output
0 63 1 3 145 233 1 0 150 0 2.3 0 0 1 1
1 37 1 2 130 250 0 1 187 0 3.5 0 0 2 1
2 41 0 1 130 204 0 0 172 0 1.4 2 0 2 1
3 56 1 1 120 236 0 1 178 0 0.8 2 0 2 1
4 57 0 0 120 354 0 1 163 1 0.6 2 0 2 1
... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
298 57 0 0 140 241 0 1 123 1 0.2 1 0 3 0
299 45 1 3 110 264 0 1 132 0 1.2 1 0 3 0
300 68 1 0 144 193 1 1 141 0 3.4 1 2 3 0
301 57 1 0 130 131 0 1 115 1 1.2 1 1 3 0
302 57 0 1 130 236 0 0 174 0 0.0 1 1 2 0

303 rows × 14 columns

In [5]:
features = dataset.drop(columns='output')

#fixing typo in data
features['thalach']=features['thalachh']
features = features.drop(columns='thalachh')

features['slope']=features['slp']
features = features.drop(columns='slp')

features['ca']=features['caa']
features = features.drop(columns='caa')

features = pd.get_dummies(features, columns=['cp', 'thall'])

features
X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(features, dataset['output'], test_size=0.3, random_state=0)
X_train
Out[5]:
age sex trtbps chol fbs restecg exng oldpeak thalach slope ca cp_0 cp_1 cp_2 cp_3 thall_0 thall_1 thall_2 thall_3
137 62 1 128 208 1 0 0 0.0 140 2 0 0 1 0 0 0 0 1 0
106 69 1 160 234 1 0 0 0.1 131 1 1 0 0 0 1 0 0 1 0
284 61 1 140 207 0 0 1 1.9 138 2 1 1 0 0 0 0 0 0 1
44 39 1 140 321 0 0 0 0.0 182 2 0 0 0 1 0 0 0 1 0
139 64 1 128 263 0 1 1 0.2 105 1 1 1 0 0 0 0 0 0 1
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
251 43 1 132 247 1 0 1 0.1 143 1 4 1 0 0 0 0 0 0 1
192 54 1 120 188 0 1 0 1.4 113 1 1 1 0 0 0 0 0 0 1
117 56 1 120 193 0 0 0 1.9 162 1 0 0 0 0 1 0 0 0 1
47 47 1 138 257 0 0 0 0.0 156 2 0 0 0 1 0 0 0 1 0
172 58 1 120 284 0 0 0 1.8 160 1 0 0 1 0 0 0 0 1 0

212 rows × 19 columns

In [6]:
forest = sklearn.ensemble.RandomForestClassifier()
forest.fit(X=X_train,y=y_train)
print(f'Accuracy: {sklearn.metrics.accuracy_score(y_test,forest.predict(X_test))}')
print(f'Recall: {sklearn.metrics.recall_score(y_test,forest.predict(X_test))}')
print(f'Precision: {sklearn.metrics.precision_score(y_test,forest.predict(X_test))}')

forest_accuracy = sklearn.metrics.accuracy_score(y_test,forest.predict(X_test))
forest_recall = sklearn.metrics.recall_score(y_test,forest.predict(X_test))
forest_precision = sklearn.metrics.precision_score(y_test,forest.predict(X_test))

print('\nResults on train dataset:')
print(f'Accuracy: {sklearn.metrics.accuracy_score(y_train,forest.predict(X_train))}')
print(f'Recall: {sklearn.metrics.recall_score(y_train,forest.predict(X_train))}')
print(f'Precision: {sklearn.metrics.precision_score(y_train,forest.predict(X_train))}')
Accuracy: 0.8131868131868132
Recall: 0.8723404255319149
Precision: 0.7884615384615384

Results on train dataset:
Accuracy: 1.0
Recall: 1.0
Precision: 1.0
In [7]:
obs = list(range(2))
import numpy as np
predict = lambda m, d: m.predict_proba(d)[:,1]

for i in obs:
    obs1 = X_test.iloc[obs[i]].to_numpy().reshape(1,-1)
    print(obs1)
    print(predict(forest, obs1))
    

X = X_test
y = y_test
explainer = dx.Explainer(forest, X_test, y_test, predict_function=predict, label="forest")
[[ 70.    1.  145.  174.    0.    1.    1.    2.6 125.    0.    0.    1.
    0.    0.    0.    0.    0.    0.    1. ]]
[0.2]
[[ 64.    1.  170.  227.    0.    0.    0.    0.6 155.    1.    0.    0.
    0.    0.    1.    0.    0.    0.    1. ]]
[0.65]
Preparation of a new explainer is initiated

  -> data              : 91 rows 19 cols
  -> target variable   : Parameter 'y' was a pandas.Series. Converted to a numpy.ndarray.
  -> target variable   : 91 values
  -> model_class       : sklearn.ensemble._forest.RandomForestClassifier (default)
  -> label             : forest
  -> predict function  : <function <lambda> at 0x7f104b9cf8b0> will be used
  -> predict function  : Accepts pandas.DataFrame and numpy.ndarray.
  -> predicted values  : min = 0.0, mean = 0.559, max = 1.0
  -> model type        : classification will be used (default)
  -> residual function : difference between y and yhat (default)
  -> residuals         : min = -0.86, mean = -0.043, max = 0.81
  -> model_info        : package sklearn

A new explainer has been created!
/home/krzysztof/anaconda3/lib/python3.9/site-packages/sklearn/base.py:450: UserWarning: X does not have valid feature names, but RandomForestClassifier was fitted with feature names
  warnings.warn(
In [8]:
explainer.model_performance(cutoff=y.mean())
Out[8]:
recall precision f1 accuracy auc
forest 0.87234 0.803922 0.836735 0.824176 0.9103
In [9]:
explainer.model_parts().result
Out[9]:
variable dropout_loss label
0 chol 0.085928 forest
1 thall_1 0.088636 forest
2 fbs 0.088781 forest
3 cp_1 0.088806 forest
4 _full_model_ 0.089700 forest
5 thall_0 0.089700 forest
6 restecg 0.092070 forest
7 cp_3 0.093254 forest
8 trtbps 0.093544 forest
9 sex 0.095019 forest
10 slope 0.095648 forest
11 cp_2 0.097897 forest
12 thall_3 0.098283 forest
13 age 0.098839 forest
14 exng 0.099057 forest
15 thall_2 0.102273 forest
16 thalach 0.103312 forest
17 oldpeak 0.107882 forest
18 cp_0 0.127732 forest
19 ca 0.139990 forest
20 _baseline_ 0.479594 forest
In [10]:
CP(forest, X.iloc[[0]], ['age', 'trtbps'], [min(X['age']), min(X['trtbps'])], [max(X['age']), max(X['trtbps'])])
In [11]:
PDP(forest, X=X, variable='age', minVal=min(X['age']), maxVal=max(X['age']))
In [12]:
cp = explainer.predict_profile(new_observation=X.iloc[[0]])
cp.plot(variables=["age", "trtbps"])
Calculating ceteris paribus: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 19/19 [00:00<00:00, 80.71it/s]
In [13]:
cp = explainer.predict_profile(new_observation=X.iloc[[11]])
cp.plot(variables=["age", "trtbps"])
Calculating ceteris paribus: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 19/19 [00:00<00:00, 75.59it/s]
In [14]:
dpd = explainer.model_profile()
dpd.plot(variables=["age", "trtbps"])
Calculating ceteris paribus: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 19/19 [00:01<00:00, 18.91it/s]
In [15]:
import sklearn.linear_model
model = sklearn.linear_model.LogisticRegression(max_iter=500)
model.fit(X=X_train,y=y_train)
print(f'Accuracy: {sklearn.metrics.accuracy_score(y_test,model.predict(X_test))}')
print(f'Recall: {sklearn.metrics.recall_score(y_test,model.predict(X_test))}')
print(f'Precision: {sklearn.metrics.precision_score(y_test,model.predict(X_test))}')

model_accuracy = sklearn.metrics.accuracy_score(y_test,model.predict(X_test))
model_recall = sklearn.metrics.recall_score(y_test,model.predict(X_test))
model_precision = sklearn.metrics.precision_score(y_test,model.predict(X_test))

print('\nResults on train dataset:')
print(f'Accuracy: {sklearn.metrics.accuracy_score(y_train,model.predict(X_train))}')
print(f'Recall: {sklearn.metrics.recall_score(y_train,model.predict(X_train))}')
print(f'Precision: {sklearn.metrics.precision_score(y_train,model.predict(X_train))}')

X = X_test
y = y_test
predict = lambda m, d: m.predict_proba(d)[:,1]
explainer = dx.Explainer(model, X_test, y_test, predict_function=predict, label="lr")
print(explainer.model_performance(cutoff=y.mean()))
cp = explainer.model_profile()
cp.plot(variables=["age", "trtbps"])
/home/krzysztof/anaconda3/lib/python3.9/site-packages/sklearn/base.py:450: UserWarning:

X does not have valid feature names, but LogisticRegression was fitted with feature names

Accuracy: 0.8241758241758241
Recall: 0.8936170212765957
Precision: 0.7924528301886793

Results on train dataset:
Accuracy: 0.8867924528301887
Recall: 0.940677966101695
Precision: 0.8671875
Preparation of a new explainer is initiated

  -> data              : 91 rows 19 cols
  -> target variable   : Parameter 'y' was a pandas.Series. Converted to a numpy.ndarray.
  -> target variable   : 91 values
  -> model_class       : sklearn.linear_model._logistic.LogisticRegression (default)
  -> label             : lr
  -> predict function  : <function <lambda> at 0x7f1048bc4550> will be used
  -> predict function  : Accepts pandas.DataFrame and numpy.ndarray.
  -> predicted values  : min = 0.00303, mean = 0.54, max = 0.994
  -> model type        : classification will be used (default)
  -> residual function : difference between y and yhat (default)
  -> residuals         : min = -0.959, mean = -0.0233, max = 0.945
  -> model_info        : package sklearn

A new explainer has been created!
<dalex.model_explanations._model_performance.object.ModelPerformance object at 0x7f104a6b9160>
Calculating ceteris paribus: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 19/19 [00:00<00:00, 215.95it/s]
In [ ]: